ABSTRACT. We address the problem of bounding below the probability of error under maximum likelihood decoding of a binary code with a known distance distribution used on a binary symmetric channel. An improved upper bound is given for the maximum attainable exponent of this probability (the reliability function of the channel). In particular, we prove that the "random coding exponent" is the true value of the channel reliability for codes of rate $R$ in some interval immediately below the critical rate of the channel. An analogous result is obtained for the Gaussian channel.
1. INTRODUCTION

Optimizing $P_e(C,p)$ over all codes of a given rate $R$ has received much attention in information and coding theory. It is known that for the best possible codes this probability declines as an exponential function of the code length. Let us define the largest attainable exponent of the error probability

$$E(R,p)=\limsup_{n\to\infty}\frac{1}{n}\log\max_{C\subseteq\{0,1\}^n,\ R(C)=R}\frac{1}{P_e(C,p)},$$

also called the error exponent or the reliability of the channel. The problem of bounding the function $E(R,p)$ for the binary symmetric and other communication channels was one of the central problems of information theory in its first decades. In particular, the standard textbooks El E3 EI EH all devote considerable
attention to properties and bounds for channel reliability. There are a variety of methods for deriving upper 
and lower estimates of $E(R,p)$. The most successful approaches to lower bounds are averaging over a suitably chosen ensemble of codes (for instance, all binary codes or all linear codes) [14] and relying on the distance distribution of an average code in a code ensemble [13], [24]. Recently the distance distribution approach has been the subject of several papers because of the renewed interest in performance estimates of specific code families (rather than ensemble-average estimates).

The problem of upper bounds on the error exponent $E(R,p)$ also has a long history. Several important ideas in this problem were suggested in the paper [7]. The nature of the upper bounds is different for low values of $R$ and for $R$ close to capacity. For low code rates, paper [27] suggested bounding the error probability below by the probability of making an error to a closest neighbor of the transmitted codeword.

1.1. Notation and previous results. Since our main result is a new bound on the error exponent $E(R,p)$, in this section we review the known bounds on this function. It should be noted that the method below applies to the analysis of any code sequence for which the distance distribution is known or can be estimated.

For notational convenience we shall write $d_{ij}$ for the Hamming distance between two codewords $x_i$ and $x_j$. We shall write $d_{iy}$ for the distance between a codeword $x_i$ and an arbitrary word $y$. Let $B^i_w=|\{x_j\in C: d_{ij}=w\}|$ and let $B_w=\sum_i B^i_w/M$ be the local and average distance distributions of the code $C$ of size $M$.
Let $h(x)$ be the binary entropy and $h^{-1}(x)$ its inverse function. Denote by $\delta_{GV}(R):=h^{-1}(1-R)$ the relative Gilbert-Varshamov distance corresponding to $R$ and by

$$D(x\|y)=x\log\frac{x}{y}+(1-x)\log\frac{1-x}{1-y}$$

the information divergence between two binomial distributions (the base of logarithms is 2 throughout). Let

$$A(\omega):=\omega\log 2\sqrt{p(1-p)}, \tag{1}$$

$$\varphi(x)=h\big(1/2-\sqrt{x(1-x)}\big).$$

Throughout $w=\omega n$, $l=\lambda n$ and $d=\delta n$. Let $[n]=\{1,2,\dots,n\}$.
For a given $p$, define

$$\rho=\rho(p)=\frac{\sqrt{p}}{\sqrt{p}+\sqrt{1-p}}.$$

The function

$$E_{sp}(R,p)=D(\delta_{GV}(R)\|p)$$

is called the sphere packing exponent; it gives an upper bound on $E(R,p)$ which is valid for all code rates $R\in[0,1-h(p)]$ and tight for code rates $R\ge R_{\rm crit}$, where the value $R_{\rm crit}=1-h(\rho)$ is called the critical rate of the channel. For low rates the best known results for a long time were given by the following theorem.
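The quantities defined so far are easy to evaluate numerically. The following Python snippet is a minimal sketch of our own, not code from the paper; the function names (`h`, `h_inv`, `divergence`, `A`, `rho`, `r_crit`, `e_sp`) are hypothetical. It computes the binary entropy, its inverse on $[0,1/2]$ by bisection, the divergence $D(x\|y)$, the exponent $A(\omega)$ of (1), $\rho(p)$, the critical rate and the sphere packing exponent, all with base-2 logarithms.

```python
import math

def h(x: float) -> float:
    """Binary entropy in bits, with h(0) = h(1) = 0."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log2(x) - (1 - x) * math.log2(1 - x)

def h_inv(y: float, tol: float = 1e-13) -> float:
    """Inverse of h on [0, 1/2] by bisection (h is increasing there)."""
    lo, hi = 0.0, 0.5
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if h(mid) < y:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def divergence(x: float, y: float) -> float:
    """D(x || y) in bits between Bernoulli(x) and Bernoulli(y)."""
    total = 0.0
    if x > 0:
        total += x * math.log2(x / y)
    if x < 1:
        total += (1 - x) * math.log2((1 - x) / (1 - y))
    return total

def A(omega: float, p: float) -> float:
    """A(omega) = omega * log2(2 sqrt(p(1-p))), Eq. (1)."""
    return omega * math.log2(2 * math.sqrt(p * (1 - p)))

def rho(p: float) -> float:
    return math.sqrt(p) / (math.sqrt(p) + math.sqrt(1 - p))

def r_crit(p: float) -> float:
    """Critical rate 1 - h(rho(p))."""
    return 1 - h(rho(p))

def e_sp(R: float, p: float) -> float:
    """Sphere packing exponent D(delta_GV(R) || p), meaningful for R <= 1 - h(p)."""
    return divergence(h_inv(1 - R), p)

if __name__ == "__main__":
    print(round(r_crit(0.01), 3))
```

For $p=0.01$ the critical rate comes out at about 0.559, the value quoted in the example for Fig. 1.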

Theorem 1.

$$-A(\delta_{GV}(R))\le E(R,p)\le -A(\delta). \tag{2}$$

Here the lower bound is Gallager's "expurgation exponent" [13], obtained for instance for a sequence of linear codes whose minimum distance meets the Gilbert-Varshamov bound. The upper bound in (2) is due to [22]. It is obtained by substituting the result of [23] into the "minimum-distance bound" of [27]. The function $\delta=\delta_{LP}(R)$ is the linear programming bound of [23] on the relative distance of codes of rate $R$, defined as

$$\delta:=\min_{0\le a\le 1/2}G(a,\tau),$$

where $G(a,\tau)=\dfrac{2\big(a(1-a)-\tau(1-\tau)\big)}{1+2\sqrt{\tau(1-\tau)}}$ and where $\tau$ satisfies $h(\tau)=h(a)-1+R$. Note that Theorem 1 implies that $E(0,p)=-A(1/2)$.
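As an illustration, here is a minimal sketch of our own (the names `G` and `delta_lp` are hypothetical) that evaluates $\delta_{LP}(R)$ by a grid search over $a$. Only values of $a$ with $h(a)\ge 1-R$ are admissible, since otherwise no $\tau\ge 0$ satisfies $h(\tau)=h(a)-1+R$. The grid search is chosen for clarity, not speed, and approximates the minimum from above.

```python
import math

def h(x):
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log2(x) - (1 - x) * math.log2(1 - x)

def h_inv(y, tol=1e-13):
    """Inverse of h on [0, 1/2] by bisection."""
    lo, hi = 0.0, 0.5
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if h(mid) < y:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def G(a, tau):
    s = math.sqrt(tau * (1 - tau))
    return 2 * (a * (1 - a) - tau * (1 - tau)) / (1 + 2 * s)

def delta_lp(R, grid=2000):
    """min over a in [h^{-1}(1-R), 1/2] of G(a, tau) with h(tau) = h(a) - 1 + R."""
    a_min = h_inv(1 - R)
    best = float("inf")
    for i in range(grid + 1):
        a = a_min + (0.5 - a_min) * i / grid
        target = max(h(a) - 1 + R, 0.0)   # guard against rounding below zero
        best = min(best, G(a, h_inv(target)))
    return best
```

At $R=0$ this gives $\delta=1/2$, and for small rates the value never exceeds the $a=1/2$ value $1/2-\sqrt{\tau(1-\tau)}$, $\tau=h^{-1}(R)$, while staying above $\delta_{GV}(R)$.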

Let

$$\tau_a(\xi):=\frac12\Big(1-\sqrt{1-4\big(\sqrt{a(1-a)-\xi(1-\xi)}-\xi\big)^2}\Big).$$

Let $R(\delta)$ be the inverse function of $\delta(R)$,

$$R(\delta)=1+\min_{\frac12(1-\sqrt{1-2\delta})\le a\le 1/2}\big(h(\tau_a(\delta/2))-h(a)\big).$$
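The function $\tau_a(\xi)$ inverts $G(a,\cdot)$: writing $s=\sqrt{\tau(1-\tau)}$, the equation $G(a,\tau)=2\xi$ is a quadratic in $s$ with nonnegative root $s=\sqrt{a(1-a)-\xi(1-\xi)}-\xi$, and then $\tau=\frac12(1-\sqrt{1-4s^2})$. The short Python check below (a sketch with hypothetical helper names) confirms the round trip $G(a,\tau_a(\delta/2))=\delta$ for a few admissible pairs $(a,\delta)$.

```python
import math

def G(a, tau):
    s = math.sqrt(tau * (1 - tau))
    return 2 * (a * (1 - a) - tau * (1 - tau)) / (1 + 2 * s)

def tau_a(a, xi):
    """Solve G(a, tau) = 2 * xi for tau in [0, 1/2]."""
    s = math.sqrt(a * (1 - a) - xi * (1 - xi)) - xi   # s = sqrt(tau (1 - tau))
    return 0.5 * (1 - math.sqrt(1 - 4 * s * s))

if __name__ == "__main__":
    for a, d in [(0.4, 0.2), (0.5, 0.3), (0.45, 0.1)]:
        print(a, d, G(a, tau_a(a, d / 2)))
```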

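Derivation of improved upper bounds on $E(R,p)$ is based on the following inequality for the error probability $P_e(x_i)$ conditioned on transmission of the codeword $x_i$; here $P_i(\cdot)$ denotes probability conditional on the transmission of $x_i$. For every $j\ne i$ let

$$X_{ij}\subseteq\{y\in\{0,1\}^n: d_{jy}\le d_{iy}\}$$

be an arbitrary subset. Let $C'\subseteq C$ be an arbitrary subcode of $C$ such that $x_i\notin C'$. Then

$$P_e(x_i)\ge\sum_{x_j\in C'}\Big[P_i(X_{ij})-\sum_{x_k\in C'\setminus\{x_j\}}P_i(X_{ik}\cap X_{ij})\Big]. \tag{3}$$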

Let us take $C'$ to be the set of codeword neighbors of $x_i$ at distance $w$ from it. We have, for any $w$,

$$P_e(x_i)\ge B^i_w\,P_i(X_{ij})\big[1-(B^i_w-1)P_i(X_{ik}\mid X_{ij})\big]_+,$$

where $x_j,x_k$ are any codewords such that $d_{ij}=d_{ik}=w$, $d_{jk}=d$, where $d$ is the code's minimum distance, and $[a]_+=\max(a,0)$. Summing both sides of the last inequality on $i$ from 1 to $M$ and dividing by $M$, we obtain the estimate of $P_e(C)$ in the form

$$P_e(C)\ge\max_w\Big\{B_w\,P_i(X_{ij})\big[1-(B_w-1)P_i(X_{ik}\mid X_{ij})\big]_+\Big\}. \tag{4}$$

Recall from [27] that a straight-line segment that connects a point on $E_{sp}(R',p)$ with a point on any other upper bound on $E(R,p)$, $R<R'$, is also a valid upper bound on $E(R,p)$. This result is called the straight-line principle. It is usually applied in situations where there is a $\cup$-convex upper bound on $E(R,p)$, and results in the straight-line segment given by the common tangent to this bound and the curve $E_{sp}(R,p)$.

The results of [21]. The upper bound in (2) was improved in [21] by relying on estimates of the distance distribution of the code. The proof in [21] is composed of two steps. The first step is bounding the distance distribution of codes by a new application of the linear programming method (similar ideas were independently developed in [?]). The second step is using (4) to derive a bound on the error exponent. The estimate of the distance distribution of codes of [21] has the following form.

Theorem 2 [21]. For any family of codes of sufficiently large length and rate $R$, any $a\in[0,1/2]$ and any $\tau$ that satisfies $0\le h(\tau)\le h(a)-1+R$, there exists a value $\omega$, $0\le\omega\le G(a,\tau)$, such that $n^{-1}\log B_{\omega n}\ge\mu(R,a,\omega)-o(1)$, where

$$\mu(R,a,\omega)=R-1+h(\tau)+2h(a)-2q(a,\tau,\omega/2)-\omega-(1-\omega)\,h\Big(\frac{a-\omega/2}{1-\omega}\Big) \tag{5}$$

and where

$$q(a,\tau,\omega)=h(\tau)+\int_0^{\omega}dy\,\log\frac{P+\sqrt{P^2-4Qy^2}}{2Q}, \tag{6}$$

where $P=a(1-a)-\tau(1-\tau)-y(1-2y)$, $Q=(a-y)(1-a-y)$, is the exponent of the Hahn polynomial $H_{\tau n}(\omega n)$.

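The bound on $E(R,p)$ in [21] has the following form.

Theorem 3.

$$E(R,p)\le\min_{a,\tau}\ \max_{0\le\lambda\le\delta}\ \max_{\lambda\le\omega\le G(a,\tau)}N, \tag{7}$$

where

$$N=\min\Big\{-A(\delta),\ -\min\big(\mu(R,a,\omega),\,-B(\omega,\lambda)\big)-A(\omega)\Big\}, \tag{8}$$

$0\le\tau\le h^{-1}(h(a)-1+R)$, $0\le a\le 1/2$; $A(\omega)$ is defined in (1), and

$$B(\omega,\lambda)=-\omega-(1-\omega)h(p)+\max_{\eta}\Big(\lambda h\Big(\frac{\eta}{\lambda}\Big)+\Big(\omega-\frac{\lambda}{2}\Big)h\Big(\frac{\omega-\eta}{2\omega-\lambda}\Big)+\Big(1-\omega-\frac{\lambda}{2}\Big)h\Big(\frac{2p(1-\omega)-\eta}{2-2\omega-\lambda}\Big)\Big), \tag{9}$$

where the maximum is over $\max(0,\lambda-\omega)\le\eta\le\min(\lambda,\omega,2p(1-\omega))$.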

Remark. In [21], optimization in (7) involves taking a maximum on $a$ and $\tau$. However, Theorem 2 is valid for any $a\in[0,1/2]$, $\tau\in[0,h^{-1}(h(a)-1+R)]$, and therefore a better bound is generally obtained by taking a minimum rather than a maximum. Throughout the rest of the paper we will assume that $h(\tau)=h(a)-1+R$. This assumption simplifies the analysis somewhat and does not seem to affect the final answer.



Analysis of the inequality (4) together with some additional ideas gives rise to Theorem 3 and its improvements. We begin by deriving a simplified form of the bound for low rates $R$.
1.2. A study of the bound (7). By omitting the term $-A(\delta)$ in the expression (8), the bound (7) remains valid (though possibly weaker) with $N$ replaced by

$$N=\max\{-\mu(R,a,\omega)-A(\omega),\ B(\omega,\lambda)-A(\omega)\}.$$

As will be seen below, for low rates $R$ the first term under the maximum is the greater one. For this reason we begin with the study of the first term for low rates. Since this term does not depend on $\lambda$, we have

$$\max_{0\le\lambda\le\delta}\ \max_{\lambda\le\omega\le G(a,\tau)}\big(-\mu-A(\omega)\big)\le\max_{0\le\omega\le G(a,\tau)}\big(-\mu-A(\omega)\big). \tag{10}$$

Lemma 4. Let $p\ge 0.037$, $0\le R\le\varphi(\delta_1)$, where $\delta_1=\dfrac{2\sqrt{p(1-p)}}{1+2\sqrt{p(1-p)}}$. Then

$$\max_{0\le\omega\le G(a,\tau)}\big(-\mu-A(\omega)\big)=-A(\delta)-R+1-h(\delta). \tag{11}$$

Proof. In the expression $-\mu(R,a,\omega)-A(\omega)$ let us take $a$ equal to the value that furnishes the minimum in the definition of $\delta$. Under the assumptions of the lemma, $R\le 0.303$. In this case it is known that $a=1/2$, and the expression $q(a,\tau,\omega/2)$ simplifies as follows. Upon the substitution $a=1/2$, $2y=z$, the integral in (6) takes the form

$$\int_0^{\omega/2}\log\frac{P+\sqrt{P^2-4Qy^2}}{2Q}\,dy=\frac12\int_0^{\omega}\log\frac{(1-2\tau)^2+(1-2\tau)\sqrt{(1-2\tau)^2-4z(1-z)}-2z(1-z)}{2(1-z)^2}\,dz.$$

Since the numerator equals $\frac12\big(1-2\tau+\sqrt{(1-2\tau)^2-4z(1-z)}\big)^2$, the argument of the logarithm is a perfect square, and the integral equals $\int_0^{\omega}\log\frac{1-2\tau+\sqrt{(1-2\tau)^2-4z(1-z)}}{2(1-z)}\,dz$. Let

$$k(\tau,\omega)=h(\tau)+\int_0^{\omega}\log\frac{1-2\tau+\sqrt{(1-2\tau)^2-4z(1-z)}}{2(1-z)}\,dz.$$

It is known [16] that in the region $0\le\omega\le 1/2-\sqrt{\tau(1-\tau)}$ this function gives the exponent of the Krawtchouk polynomial $K_{\tau n}(\omega n)$, i.e.,

$$\log K_{\tau n}(\omega n)=n\big(k(\tau,\omega)+o(1)\big).$$

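Therefore, we obtain the identity $q(1/2,\tau,\omega/2)=k(\tau,\omega)$. Substituting this in $\mu$, we obtain

$$-\mu-A(\omega)=-2h(\tau)+2k(\tau,\omega)-\omega\log\sqrt{4p(1-p)}.$$

Let $g(\omega)=\frac{\partial}{\partial\omega}\big(-\mu-A(\omega)\big)$. From the equation $g(\omega)=0$ we find that a stationary point $\omega$ satisfies

$$1-2\tau-2\sqrt{u}\,(1-\omega)=-\sqrt{(1-2\tau)^2-4\omega(1-\omega)},$$

where $u=\sqrt{4p(1-p)}$. Squaring both sides and dividing by $4(1-\omega)$ gives a linear equation in $\omega$, whose root is

$$\omega^*(\tau)=\frac{\sqrt{u}\,(1-2\tau)-u}{1-u}.$$

This root solves the original (unsquared) equation only if the left-hand side is nonpositive at $\omega=\omega^*$, which amounts to the condition $1-2\tau\le 2\sqrt{u}/(1+u)$.

Recall that $0\le\omega\le G(1/2,\tau)=\frac12-\sqrt{\tau(1-\tau)}$. We shall show that

$$\arg\max_{0\le\omega\le G(1/2,\tau)}\big(-\mu-A(\omega)\big)=G(1/2,\tau). \tag{12}$$

There are two cases.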
(i) Let $R=\varphi(\delta_1)$. In this case the stationary point is exactly at the right end of the interval, i.e., $\omega^*(\tau)=\frac12-\sqrt{\tau(1-\tau)}$. To show this, compute

$$\delta_1=\frac{u}{1+u},\qquad \tau=h^{-1}(R)=\frac12-\sqrt{\delta_1(1-\delta_1)}=\frac{(1-\sqrt{u})^2}{2(1+u)},$$

so that $1-2\tau=2\sqrt{u}/(1+u)$ (the condition above holds with equality), and substituting this into $\omega^*$ we find

$$\omega^*(\tau)=\frac{1}{1-u}\Big(\frac{2u}{1+u}-u\Big)=\frac{u}{1+u}=\delta_1=G(1/2,\tau).$$

(ii) Now consider code rates $0\le R<\varphi(\delta_1)$. Observe that $\tau=h^{-1}(R)$ decreases as $R$ decreases, so in this case $1-2\tau>2\sqrt{u}/(1+u)$, the condition above fails, and $g(\omega)$ has no zeros for $0\le\omega\le G(1/2,\tau)$. It is positive throughout because $g(0)>0$ (indeed, $g(0)=2\log(1-2\tau)-\log u$ and $(1-2\tau)^2>4u/(1+u)^2\ge u$). This again proves (12).

Hence $-\mu-A(\omega)$ attains its maximum over $\omega\in[0,G]$ at the right end of this segment. Substituting $\omega=G(1/2,\tau)$ into this expression, we obtain the claim of the lemma. □
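Case (i) of the proof can be checked numerically. The sketch below is our own (names hypothetical). It takes $u=2\sqrt{p(1-p)}$, $\delta_1=u/(1+u)$ and $\tau=\frac12-\sqrt{\delta_1(1-\delta_1)}$, computes the stationary point $\omega^*(\tau)$ and $G(1/2,\tau)$, and evaluates the derivative $g(\omega)=2\log\frac{1-2\tau+\sqrt{(1-2\tau)^2-4\omega(1-\omega)}}{2(1-\omega)}-\log u$ at $\omega^*$. It also reproduces the threshold in Lemma 4: for $p=0.037$ one gets $\varphi(\delta_1)\approx 0.303$.

```python
import math

def h(x):
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log2(x) - (1 - x) * math.log2(1 - x)

def g(omega, tau, u):
    """d/d omega of -2h(tau) + 2k(tau, omega) - omega * log2(u)."""
    c = 1 - 2 * tau
    disc = max(c * c - 4 * omega * (1 - omega), 0.0)   # clip tiny negative rounding error
    return 2 * math.log2((c + math.sqrt(disc)) / (2 * (1 - omega))) - math.log2(u)

def case_i(p):
    u = 2 * math.sqrt(p * (1 - p))                     # u = sqrt(4p(1-p))
    delta1 = u / (1 + u)
    tau = 0.5 - math.sqrt(delta1 * (1 - delta1))       # tau = h^{-1}(phi(delta1))
    omega_star = (math.sqrt(u) * (1 - 2 * tau) - u) / (1 - u)
    return {"u": u, "delta1": delta1, "tau": tau, "omega_star": omega_star,
            "G": 0.5 - math.sqrt(tau * (1 - tau)), "phi_delta1": h(tau)}

if __name__ == "__main__":
    print(case_i(0.037))
```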

For $R>0.305$ the minimum in the definition of $\delta$ is attained at some $a<1/2$. Fixing $a$ equal to this value, we observe that the function $\mu$ depends only on $\omega$. Therefore, the behavior of the function $-\mu(R,a,\omega)-A(\omega)$ can be studied numerically (for instance, using Mathematica). We observe that this function increases in $\omega$ for $\omega\le\delta(R)$ as long as $R\le R(\delta_1)$. For $R=R(\delta_1)$, the maximum of $-\mu(R,a,\omega)-A(\omega)$ over $\omega$ is attained for $\omega=\delta=\delta_1$. Substituting $\omega=\delta$ into $\mu$, we again arrive at the expression (11).

To summarize, the bound (7) implies the following: let $R\le R(\delta_1)$; then

$$E(R,p)\le\max\Big\{-A(\delta)-R+1-h(\delta),\ \max_{\lambda,\omega}\big(B(\omega,\lambda)-A(\omega)\big)\Big\}. \tag{13}$$

Next we show that for low code rates the maximum in this expression is given by the term $-A(\delta)-R+1-h(\delta)$. This is difficult to verify analytically because of the complicated form of the term $B$; however, it can be verified numerically for any given value of the probability $p$. More precisely, there exists a value of the rate $R=R_0$, a function of $p$, such that for $0\le R\le R_0$ the first term in (13) is greater than the second one.

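As a result, we obtain the following proposition.

Proposition 5. Let $R(\delta_1)<R_0$. Then

$$E(R,p)\le -A(\delta)-R+1-h(\delta),\qquad 0\le R\le R_0, \tag{14}$$

$$E(R,p)\le\max_{0\le\lambda\le\delta}\ \max_{\lambda\le\omega\le\delta}\big(B(\omega,\lambda)-A(\omega)\big),\qquad R_0\le R. \tag{15}$$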

The example of $p=0.01$ is shown in Fig. 1.

Some comments are in order. The first term on the right in (13) is the "reverse union bound," which suggests estimating the error rate $P_e(x_i)$ by a sum of pairwise error probabilities. An interesting fact is that for large $n$ and for certain values of $R$ and $p$ the union bound argument gives the correct value of the error exponent. From (14) we can see this and more, namely that for large $n$ and code rates below $R_0$, the error exponent is given by the sum of pairwise probabilities of incorrect decoding to a codeword at the minimum distance of the code $C$ from the transmitted codeword. (Note that the relative minimum distance of $C$ is bounded above by $\delta$.) The improvement of (14) over the upper bound in (2) is that it takes into account decoding errors to all $2^{n(R-1+h(\delta))}$ neighbors of the transmitted vector, as opposed to just one such neighbor in (2). The main question addressed below is to determine the range of code rates where the union bound (14) holds and to refine the inequality (13) for those rates where the union bound does not apply.
In general terms, the answer to this question for large $n$ is given by (4). The bound $P_e(C)\ge B_wP_i(X_{ij})$ is valid as long as

$$B_w\,P_i(X_{ik}\cap X_{ij})\le P_i(X_{ij}). \tag{16}$$

In our analysis we use the estimation method of [6], [7], which was originally developed for codes on the sphere in $\mathbb{R}^n$. Below we modify it for use in the Hamming space and improve the estimate (7). The analysis of the relation between the distance distribution and $P_e(C,p)$ for the Hamming space turns out to be more difficult than for $\mathbb{R}^n$. One of the issues to be addressed is the choice of decision regions in the estimation process. We suggest one choice which, while still being tractable, leads to improved estimates.

The results of the present paper are twofold: first, we expand the applicability limits of the bound (14); second, outside these limits we derive a bound on $E(R,p)$ which is better than the result obtained from Theorem 3.

2. A New Bound 

2.1. Statement of the result. Let us state a lower bound for the error probability of maximum likelihood decoding of an arbitrary sequence of codes with a given distance distribution.

Theorem 6. Let $(C_i)_{i\ge1}$ be a sequence of codes with rate $R$, relative distance $\delta$ and distance distribution satisfying $B_{\omega n}\ge 2^{n\beta(\omega)-o(n)}$, where $\beta(\omega)\ge 0$ for all $\delta\le\omega\le1$. The error probability of maximum likelihood decoding of these codes satisfies $P_e(C,p)\ge 2^{-En-o(n)}$, where

$$E=\min_{\delta\le\omega\le1}\ \max_{\delta\le\lambda\le\omega}\Big[\max\big(-\beta(\omega)-A(\omega),\ B(\omega,\lambda)-A(\lambda)\big)\Big], \tag{17}$$

where $A$ and $B$ are defined in Eqs. (1) and (9), respectively.
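The order of the optimizations in (17) is easy to misread, so here is a direct grid evaluation, written as our own sketch: `beta`, `A` and `B` are passed in as callables (a hypothetical interface), the outer minimum runs over $\omega\in[\delta,1]$ and the inner maximum over $\lambda\in[\delta,\omega]$. The checks use artificial functions chosen only so that the answer is known in closed form; they are not the channel quantities of the paper.

```python
from typing import Callable

def theorem6_exponent(beta: Callable[[float], float],
                      A: Callable[[float], float],
                      B: Callable[[float, float], float],
                      delta: float, steps: int = 200) -> float:
    """Grid version of (17): min over omega in [delta, 1] of the max over
    lambda in [delta, omega] of max(-beta(omega) - A(omega), B(omega, lambda) - A(lambda))."""
    grid = [delta + (1 - delta) * i / steps for i in range(steps + 1)]
    best = float("inf")
    for i, omega in enumerate(grid):
        inner = max(max(-beta(omega) - A(omega), B(omega, lam) - A(lam))
                    for lam in grid[: i + 1])        # lambda ranges over [delta, omega]
        best = min(best, inner)
    return best

if __name__ == "__main__":
    # Toy input: beta = 0, A(w) = -w, B = 0, so the inner max equals omega
    # and the outer min is attained at omega = delta.
    print(theorem6_exponent(lambda w: 0.0, lambda w: -w, lambda w, l: 0.0, delta=0.3))
```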

Theorem 6 will be proved later in this section. We first discuss its application to the problem of bounding $E(R,p)$. Let us specify this theorem for the distance distribution defined by Theorem 2. Let $a$, $\tau$, $G(a,\tau)$ have the same meaning as in (7). Recall that by Theorem 2, for any family of codes of rate $R$ and every $a\in[0,1/2]$ there exists an $\omega$, $0\le\omega\le G(a,\tau)$, such that the average number of neighbors at distance $\omega n$ can be bounded as $B_{\omega n}\ge 2^{n\mu(R,a,\omega)-o(n)}$. Let us substitute this distance distribution in (17) and perform the optimization. By Lemma 4 and the argument after it, for low values of $R$ we conclude that the function $E(R,p)$ is bounded above by (11). Let $R_0^*$ be the value of the rate, a function of $p$, for which the maximum shifts from the first term in (17) to the second one. As in the previous section, we arrive at the following theorem.

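Theorem 7. Let $R(\delta_1)<R_0^*$. Then

$$E(R,p)\le -A(\delta)-R+1-h(\delta),\qquad 0\le R\le R_0^*, \tag{18}$$

$$E(R,p)\le\max_{0\le\lambda\le\delta}\ \max_{\lambda\le\omega\le\delta}\big(B(\omega,\lambda)-A(\lambda)\big),\qquad R_0^*\le R, \tag{19}$$

where $A$ and $B$ are defined in Eqs. (1) and (9), respectively.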

Example (explanation of Fig. 1). To show that (17) improves over (7), let $p=0.01$. Then from (14), (13) we obtain $R_0\approx 0.271$. From (17) we find that the bound (18) is valid for $R\le R_0^*\approx 0.388$. Note also that $R_{\rm crit}=0.559$, $R(\delta_1)=0.537$. See Figure 1 for a graph of the known error bounds, including our new bounds. In the figure, curve (a) is a combination of the best lower bounds on the error exponent. Curve (b) is the union bound of (14), (18). Curve (c) is the upper bound (15) given by Theorem 3 and Prop. 5. Curve (d) is the upper bound (19) given by Theorem 6. Curve (e) is the sphere-packing bound $E_{sp}(R,p)$.

The improvement of Theorem 6 over Theorem 3 lies in the extended region where the union bound (curve (b)) is applicable and in a better bound for greater values of the rate $R$.
Figure 1. Bounds on the error exponent for the BSC with $p=0.01$. Notation explained in the text.

Note that $E_{sp}(R,p)$ is better than (b) from $R\approx 0.422$; the straight-line bound (not shown) further improves the results.

Another set of examples, together with some implications of Theorems 6 and 7, will be given in Sect. 3.

Remark. Experience leads us to believe that the maxima in (19) are achieved for $\omega=\lambda=\delta$, which would give us the bound

$$E(R,p)\le B(\delta,\delta)-A(\delta),\qquad R_0^*\le R.$$

However, this has proved too difficult to verify analytically due to the cubic condition for $\eta$ in the maximization term in the definition of $B(\omega,\lambda)$ and other computational problems.

2.2. Preview of the proof. The basic idea of the estimation method is from [7], although we make some modifications due to the fact that the observation space is discrete. To prove this theorem we start by choosing a collection of sets $\{Y_{ij}\}$, each corresponding to a pair of codewords $(x_i,x_j)$, such that $Y_{ij}$ is outside the decoding region of $x_i$ and

$$Y_{ij}\cap Y_{ik}=\emptyset\quad\text{for all }k\ne j.$$

Then we can bound the error probability in terms of these sets using the following inequality:

$$P_e(C)\ge\frac1M\sum_{i=1}^{M}\ \sum_{j:\,d_{ij}=w}P_i(Y_{ij})\qquad(w=1,2,\dots,n).$$

One of the main questions in applying this inequality and further ideas of [7] is the choice of the sets $Y_{ij}$. We construct the $Y_{ij}$'s via sets $X_{ij}\subseteq\mathbb{F}_2^n$, where

$$X_{ij}=\{y\in\mathbb{F}_2^n: d_{iy}=d_{jy}=w/2+p(n-d_{ij})\}.$$

See Figure 2 for an illustration of the bounding process. To create the $Y_{ij}$'s from the $X_{ij}$'s we randomly "prune" these sets so that the disjointness condition is satisfied. To accomplish this pruning we define a set of codewords $T_i=\{x_j: d_{ij}=w\}$ for each codeword $x_i$. Then, as in [7], for each $x_i$ we randomly index by $s_{ij}$ all the codewords $x_j$ that are at distance $w$ from $x_i$. Define the sets

$$T(i,j)=\{k\in T_i: s_{ik}<s_{ij}\}.$$
FIGURE 2. The bounding process. (a) A codeword $x_i$, neighboring codewords and the Voronoi region $D(x_i)$. (b) We restrict our attention to only those neighbors that are at distance $w$. By requiring only that the received word $y$ be closer to this subset of the neighbors, we upper bound $P_i(D(x_i))$. (c) For each neighbor $x_j$ still under consideration, let $X_{ij}$ be some set of words that are at least as close to $x_j$ as to $x_i$. (d) We "prune" the $X_{ij}$'s to construct disjoint $Y_{ij}$'s with the required properties.

We then get our $Y_{ij}$'s as follows:

$$Y_{ij}=X_{ij}\setminus\Big[\bigcup_{k\in T(i,j)}X_{ik}\Big].$$

These $Y_{ij}$ satisfy the disjointness condition. Indeed, assume there exists $x\in Y_{im}\cap Y_{il}$. Then $x\in X_{il}$ and $x\notin\bigcup_{k\in T(i,m)}X_{ik}$, which gives $s_{il}>s_{im}$. However, we also have $x\in X_{im}$ and $x\notin\bigcup_{k\in T(i,l)}X_{ik}$, and this gives $s_{im}>s_{il}$, which is a contradiction.
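The pruning step uses nothing specific to the Hamming space: for any finite family of sets $X_{ij}$ (with $i$ fixed), a random order $s_{ij}$ and $Y_{ij}=X_{ij}\setminus\bigcup_{k\in T(i,j)}X_{ik}$ give pairwise disjoint sets. The Python sketch below (the function name `prune` is ours) illustrates this on abstract integer sets. It also shows that the $Y_{ij}$ still cover $\bigcup_j X_{ij}$, since each point is kept in the set that comes first in the random order among those containing it.

```python
import random

def prune(X, seed=0):
    """For a fixed i, X maps j -> X_ij (a set). Returns j -> Y_ij, where
    Y_ij = X_ij minus the union of X_ik over all k with s_ik < s_ij."""
    rng = random.Random(seed)
    order = list(X)
    rng.shuffle(order)
    s = {j: pos for pos, j in enumerate(order)}          # random indexing s_ij
    Y = {}
    for j in X:
        earlier = set().union(*(X[k] for k in X if s[k] < s[j]))   # k in T(i, j)
        Y[j] = X[j] - earlier
    return Y

if __name__ == "__main__":
    X = {0: {1, 2, 3}, 1: {3, 4}, 2: {2, 4, 5}}
    print(prune(X, seed=1))
```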

Instead of calculating $P_i(Y_{ij})$ directly we apply a "reverse union bound" to get

$$P_i(Y_{ij})\ge P_i(X_{ij})\,(1-K_{ij}), \tag{20}$$

where $K_{ij}=\sum_{k\in T(i,j)}P_i(X_{ik}\mid X_{ij})$. Note that this inequality is the bound (3) with our particular choice of $X_{ij}$, $Y_{ij}$. Using the last inequality we perform a recursive procedure which shows the existence of a subcode $C'\subseteq C$ with large error probability (among the codewords of $C'$). This gives the claimed lower bound on $P_e(C,p)$.

2.3. A proof of Theorem 6. The error probability for two codewords is given by the following well-known lemma.

Lemma 8. For all codewords $x_i$ and $x_j$ that are at distance $w$ apart, $\lim_{n\to\infty}\frac1n\log P_i(X_{ij})=A(\omega)$, where $A(\omega)$ is defined in (1).
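Lemma 8 is easy to see numerically: a word $y\in X_{ij}$ must flip exactly $w/2$ of the $w$ positions where $x_i$ and $x_j$ differ and exactly $p(n-w)$ of the remaining positions. The sketch below is our own illustration (helper names hypothetical; it assumes $w$ is even and $p(n-w)$ is an integer). It evaluates $\frac1n\log P_i(X_{ij})$ exactly with `math.lgamma` and shows the gap to $A(\omega)$ shrinking as $n$ grows.

```python
import math

def log2_binom(n, k):
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)) / math.log(2)

def A(omega, p):
    return omega * math.log2(2 * math.sqrt(p * (1 - p)))

def pairwise_exponent(n, w, p):
    """(1/n) log2 P_i(X_ij) for X_ij = {y : d_iy = d_jy = w/2 + p(n - w)}."""
    k_out = round(p * (n - w))          # flips outside the w differing positions
    flips = w // 2 + k_out              # total Hamming distance from x_i
    log2_prob = (log2_binom(w, w // 2) + log2_binom(n - w, k_out)
                 + flips * math.log2(p) + (n - flips) * math.log2(1 - p))
    return log2_prob / n

if __name__ == "__main__":
    for n in (1000, 4000):
        print(n, pairwise_exponent(n, n // 5, 0.1), A(0.2, 0.1))
```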

Lemma 9. For all codewords $x_i$, $x_j$ and $x_k$ such that $d_{ij}=d_{ik}=w$ and $d_{jk}=l$ we have

$$\lim_{n\to\infty}\frac1n\log P_i(X_{ik}\mid X_{ij})=B(\omega,\lambda),$$

where $B(\omega,\lambda)$ is defined in Eq. (9).

Proof. First consider

$$P_i(X_{ik}\cap X_{ij})=\sum_{m}\binom{l/2}{m}^2\binom{w-l/2}{w/2-m}\binom{n-w-l/2}{p(n-w)-m}\,p^{\,w/2+p(n-w)}(1-p)^{\,n-w/2-p(n-w)},$$

where $m$ is the number of flipped positions among the $l/2$ positions where $x_j$ differs from $x_i$ but $x_k$ does not. Then since

$$\log P_i(X_{ik}\mid X_{ij})=\log P_i(X_{ik}\cap X_{ij})-\log P_i(X_{ij}),$$

substituting for $P_i(X_{ij})$ from the previous lemma and taking the appropriate limits gives the required result. □
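The limit in Lemma 9 can be compared numerically with the closed-form expression for $B(\omega,\lambda)$ written out in the code below. This is a minimal sketch of our own (helper names hypothetical). It evaluates the exact finite-$n$ quantity $\frac1n\log P_i(X_{ik}\mid X_{ij})$ from the sum in the proof, using `math.lgamma` for the binomial coefficients, and the asymptotic expression by a grid search over $\eta=2m/n$. It assumes $w$ and $l$ are even, $p(n-w)$ is an integer and $\lambda<2\omega$.

```python
import math

def h(x):
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log2(x) - (1 - x) * math.log2(1 - x)

def log2_binom(n, k):
    if k < 0 or k > n:
        return float("-inf")
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)) / math.log(2)

def log2_sum(values):
    """log2 of sum(2**v), computed stably."""
    top = max(values)
    return top + math.log2(sum(2.0 ** (v - top) for v in values))

def cond_exponent_exact(n, w, l, p):
    """(1/n) log2 P_i(X_ik | X_ij) for d_ij = d_ik = w, d_jk = l.
    The common factor p^(.) (1-p)^(.) cancels in the ratio and is omitted."""
    k_out = round(p * (n - w))
    terms = [2 * log2_binom(l // 2, m)
             + log2_binom(w - l // 2, w // 2 - m)
             + log2_binom(n - w - l // 2, k_out - m)
             for m in range(max(0, l // 2 - w // 2), min(l // 2, w // 2, k_out) + 1)]
    log2_pair = log2_binom(w, w // 2) + log2_binom(n - w, k_out)
    return (log2_sum(terms) - log2_pair) / n

def B(omega, lam, p, grid=20000):
    """Asymptotic exponent, maximized over eta = 2m/n on a grid (assumes lam < 2*omega)."""
    lo = max(0.0, lam - omega)
    hi = min(lam, omega, 2 * p * (1 - omega))
    best = -float("inf")
    for i in range(grid + 1):
        eta = lo + (hi - lo) * i / grid
        val = ((lam * h(eta / lam) if lam > 0 else 0.0)
               + (omega - lam / 2) * h((omega - eta) / (2 * omega - lam))
               + (1 - omega - lam / 2) * h((2 * p * (1 - omega) - eta) / (2 - 2 * omega - lam)))
        best = max(best, val)
    return best - omega - (1 - omega) * h(p)

if __name__ == "__main__":
    print(cond_exponent_exact(2000, 400, 200, 0.1), B(0.2, 0.1, 0.1))
```

For $\lambda=0$ (that is, $x_k=x_j$) the asymptotic expression returns 0, as it should for a conditional probability equal to one.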
The following properties of $B(\omega,\lambda)$ can be verified numerically.

Lemma 10. If $\omega\le\lambda\le2\omega$ then $B(\omega,\lambda)\le B(\omega,\omega)$. If $\lambda\le\omega$ then $B(\lambda,\lambda)\le B(\omega,\lambda)$.

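Recall that the indexing of pairs to create the sets $T(i,j)$ is done randomly. By linearity of expectation there exists an indexing such that

$$\frac1M\sum_{i=1}^{M}\ \sum_{j:\,d_{ij}=w}P_i(Y_{ij})\ge\frac1M\sum_{i=1}^{M}\ \sum_{j:\,d_{ij}=w}\mathbb{E}\,P_i(Y_{ij}). \tag{21}$$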

This equation will be the basis for our new bound on the error exponent, but before deriving this bound we need two final preliminaries. First, we will refer to all codewords $x_j$ that are at distance $w$ from $x_i$ as $w$-neighbors of $x_i$. (Recall that we defined $B^i_w$ to be the number of codewords in the $w$-neighborhood of $x_i$.) Second, we shall say that a subset $S'\subseteq S$ of codewords is of substantial size (with respect to $S$) if its size has the same exponential order as the size of $S$. Note that for a family of codes $(C_i)_{i\ge1}$, where $C_i$ has length $n$ and rate $R$, we can consider $(C'_i)_{i\ge1}$, a family of codes where $C'_i$ is a substantially sized subcode of $C_i$, when trying to bound the error exponent, since

$$\lim_{n\to\infty}R(C'_i)=\lim_{n\to\infty}R(C_i)=R$$

and

$$\limsup_{n\to\infty}\frac1n\log\frac{1}{P_e(C'_i,p)}\ge\limsup_{n\to\infty}\frac1n\log\frac{1}{P_e(C_i,p)}.$$

We now proceed with a case analysis depending on the values of $K_{ij}$. Roughly speaking, when $K_{ij}$ is typically less than one half, a union bound argument will be used to bound the error probability. When $K_{ij}$ is typically larger than one half, a more complicated analysis will be required. Before we describe the two cases in our analysis we need the following two lemmas.

Lemma 11 [8]. Suppose that there are $L$ balls of $K$ different colors. The number of balls of color $k$ is $r_k$. We are also given numbers $n_k$, $1\le k\le K$. Suppose that all balls are enumerated randomly by different integers from 1 up to $L$. Let $r$ be a random integer between 1 and $L$ and let $\mu_k$ be the number of balls of color $k$ with numbers between 1 and $r$. Then

$$P(\mu_k\le n_k,\ k=1,\dots,K)\ge\min\Big\{1,\ \frac14\min_{1\le k\le K}\frac{n_k}{r_k}\Big\}.$$
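A quick Monte Carlo experiment makes the statement of Lemma 11 concrete. The sketch below is ours (function names hypothetical), and the constant $1/4$ in `lemma11_bound` is the one as we read it from the statement above. It numbers the balls by a random permutation, draws $r$ uniformly from $\{1,\dots,L\}$, and estimates $P(\mu_k\le n_k\ \forall k)$, which in this example lies well above the bound.

```python
import random

def estimate_probability(r_counts, n_limits, trials=4000, seed=0):
    """Monte Carlo estimate of P(mu_k <= n_k for all k).

    The L balls (r_counts[k] of colour k) get a uniformly random numbering,
    r is uniform on {1, ..., L}, and mu_k counts colour-k balls numbered 1..r."""
    rng = random.Random(seed)
    balls = [k for k, r_k in enumerate(r_counts) for _ in range(r_k)]
    L = len(balls)
    hits = 0
    for _ in range(trials):
        rng.shuffle(balls)                  # random numbering of the balls
        r = rng.randint(1, L)               # random cut-off r
        mu = [0] * len(r_counts)
        for colour in balls[:r]:
            mu[colour] += 1
        if all(m <= n_k for m, n_k in zip(mu, n_limits)):
            hits += 1
    return hits / trials

def lemma11_bound(r_counts, n_limits):
    """min{1, (1/4) min_k n_k / r_k}."""
    return min(1.0, 0.25 * min(n_k / r_k for n_k, r_k in zip(n_limits, r_counts)))

if __name__ == "__main__":
    r_counts, n_limits = [50, 100, 150], [5, 15, 30]
    print(estimate_probability(r_counts, n_limits), lemma11_bound(r_counts, n_limits))
```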

Recall that, for a given pair, $K_{ij}$ is a random variable. We then can prove the following lemma.

Lemma 12. Let $d_{ij}=\omega n$. With respect to the random indexing of all the $(i,k)$ pairs (where $x_k$ is any codeword such that $d_{ik}=\omega n$) we have

$$P\Big(K_{ij}\le\frac12\Big)\ge\min\Big\{1,\ \min_{l\in\mathcal{A}}\frac{2^{-nB(\omega,\lambda)-o(n)}}{\min\{B^i_w,B^j_l\}}\Big\},$$

where $\lambda=l/n$, $\mathcal{A}=\{l\in[n]: |R_{w,l}|>N_{w,l}\}$, $R_{w,l}=\{x_k\in C: d_{ik}=w,\ d_{jk}=l\}$ and $N_{w,l}=\dfrac{2^{-nB(\omega,\lambda)}}{2(n+1)}$.
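Proof. Up to factors $2^{o(n)}$, Lemma 9 gives $P_i(X_{ik}\mid X_{ij})=2^{nB(\omega,\lambda)}$ whenever $d_{jk}=l=\lambda n$. Hence

$$P\Big(K_{ij}\le\frac12\Big)=P\Big(\sum_{k\in T(i,j)}P_i(X_{ik}\mid X_{ij})\le\frac12\Big)=P\Big(\sum_{l=0}^{n}\ \sum_{k\in T(i,j),\,d_{jk}=l}2^{nB(\omega,\lambda)}\le\frac12\Big)$$

$$=P\Big(\sum_{l=0}^{n}|T(i,j)\cap R_{w,l}|\,2^{nB(\omega,\lambda)}\le\frac12\Big)\ge P\big(|T(i,j)\cap R_{w,l}|\le N_{w,l}\ \ \forall l\in\mathcal{A}\big),$$

since for $l\notin\mathcal{A}$ the condition $|T(i,j)\cap R_{w,l}|\le N_{w,l}$ holds automatically. Let there be a ball for each codeword in $\bigcup_l R_{w,l}$. Consider a ball from $R_{w,l}$ to have color $l$. Let $n_l=N_{w,l}$ and $\mu_l=|\{x_m\in R_{w,l}: s_{im}<s_{ij}\}|$. We have

$$P\Big(K_{ij}\le\frac12\Big)\ge P(\mu_l\le n_l\ \ \forall l\in\mathcal{A}).$$

By the previous lemma we have

$$P(\mu_l\le n_l\ \ \forall l\in\mathcal{A})\ge\frac14\min_{l\in\mathcal{A}}\frac{n_l}{|R_{w,l}|}$$

if the right-hand side is less than one. The lemma then follows from the fact that $|R_{w,l}|\le\min\{B^i_w,B^j_l\}$. □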



In the analysis that leads to Theorem 6, we face a dichotomy between a relatively sparse $w$-neighborhood of the transmitted vector $x_i$, when the union bound is asymptotically tight, and a cluttered neighborhood, when it is not. These two cases correspond to the first and the second terms in (17), respectively. When the union bound analysis is not applicable, we will rely crucially on the following lemma.

Lemma 13. If $K_{ij}>1/2$ for some $i,j$ such that $d_{ij}=\omega n$, then there exists a nonempty set $\mathcal{A}_{ij}$ such that for all $\lambda\in\mathcal{A}_{ij}$,

$$\min\{B^i_w,B^j_{\lambda n}\}\ge 2^{-nB(\omega,\lambda)-o(n)}.$$

Proof. Consider a pair of codewords $x_i$ and $x_j$ such that $K_{ij}>1/2$. We deduce that $P(K_{ij}\le1/2)<1$, since the event $\{K_{ij}>1/2\}$ occurred. Therefore, by Lemma 12 there exists a $\lambda$ such that

$$\frac{2^{-nB(\omega,\lambda)-o(n)}}{\min\{B^i_w,B^j_{\lambda n}\}}<1.$$

□

Given a pair of codewords $x_i,x_j$ with $K_{ij}\le1/2$ we put $\mathcal{A}_{ij}=\emptyset$; otherwise, we assume that $\mathcal{A}_{ij}$ contains all the values of $\lambda=l/n$ whose existence is established in the previous lemma. We now define, for all $n$ possible values of $l=\lambda n$, the sets

$$G_{l,w}=\{x_j: \exists\,x_i\ \text{such that}\ K_{ij}>1/2\ \text{and}\ l/n\in\mathcal{A}_{ij}\}.$$

In words, for a given $l$, the set $G_{l,w}\subseteq C$ contains all the codewords $x_j$ that have a $w$-neighbor $x_i$ such that the set $\mathcal{A}_{ij}$ contains the value $\lambda=l/n$. Let $H_{l,w}$ be defined as the set of all $x_i\in C$ such that a substantial number of the $w$-neighbors $x_j$ of $x_i$ satisfy $K_{ij}>1/2$ and $l/n\in\mathcal{A}_{ij}$. Note that the "substantial number" here is in relation to $B^i_w$.

We say that $\lambda=l/n$ is a "nuisance level" for $\omega$ if $H_{l,w}$ and $G_{l,w}$ are both substantially sized subcodes of $C$. The two cases in the following analysis correspond to whether or not a nuisance level exists. The next theorem bounds the error probability in the case that it does not exist.
Theorem 14. Consider any code $C$ of sufficiently large length $n$ and rate $R$. Assume that for some $\omega$ and bounding function $f$ we have $\frac1n\log B^i_w\ge f(\omega)$ for all $i$. If there does not exist a nuisance level for $\omega$, then

$$\frac1n\log\frac{1}{P_e(C,p)}\le -f(\omega)-A(\omega)+o(1).$$

Proof. Let us define the sets

$$S_1=\{l: H_{l,w}\ \text{is not a substantially sized subcode}\},\qquad S_2=\{l: G_{l,w}\ \text{is not a substantially sized subcode}\}.$$

Since $\omega$ does not have a nuisance level, $S_1\cup S_2=[n]$. Without loss of generality we may assume that $G_{l,w}=\emptyset$ for all $l\in S_2$, since removing $\bigcup_{l\in S_2}G_{l,w}$ yields a substantially sized subcode. Hence also $H_{l,w}=\emptyset$ for all $l\in S_2$. Now consider only transmitting the codewords in $C'=C\setminus\bigcup_{l\in[n]}H_{l,w}$, and note that this is a substantially sized set of codewords, since neither $\bigcup_{l\in S_1}H_{l,w}$ nor $\bigcup_{l\in S_2}H_{l,w}$ is substantially sized. For each of these codewords we know that $\frac1n\log B^i_w\ge f(\omega)$. Hence, writing $M'=|C'|$,

$$P_e(C',p)\ge\frac{1}{M'}\sum_{x_i\in C'}\ \sum_{j:\,d_{ij}=w}P_i(Y_{ij})\ge 2^{-o(n)}\min_{i,j:\,d_{ij}=w,\ K_{ij}\le1/2}\big(B^i_w\,P_i(Y_{ij})\big)\ge 2^{n(A(\omega)+f(\omega))-o(n)}.$$

The second inequality follows from the fact that for each $x_i\in C'$ a substantial number of $w$-neighbors $x_j$ are such that $K_{ij}\le1/2$, and the third one is implied by (20), since $P_i(Y_{ij})\ge P_i(X_{ij})/2$ whenever $K_{ij}\le1/2$, together with Lemma 8. □

We now bound the error probability (and ensure another property of the distance distribution) in the case that there exists a nuisance level.

Theorem 15. Consider any code $C$ of sufficiently large length $n$ and rate $R$ and an $\omega\in[0,1]$. Let $\lambda$ be a nuisance level for $\omega$. The subset of codewords $x_j\in C$ such that

$$|\{x_k\in C: d(x_j,x_k)=\lambda n\}|\ge 2^{-nB(\omega,\lambda)-o(n)}$$

forms a substantially sized subcode. Furthermore,

$$P_e(C,p)\ge 2^{n(A(\omega)-B(\omega,\lambda))-o(n)}.$$

Proof. Since $G_{l,w}$ is substantially sized, it follows by Lemma 13 that a substantial number of codewords $x_j$ have at least $2^{-nB(\omega,\lambda)-o(n)}$ neighbors at relative distance $\lambda$. Now consider $x_i\in H_{l,w}$. By definition, there is a substantially sized subset $N(i)$ of the $w$-neighbors of $x_i$ such that $\lambda\in\mathcal{A}_{ij}$ for all $x_j\in N(i)$. Hence, appealing to Lemma 12, for each $x_j\in N(i)$,

$$P(K_{ij}\le1/2)\ge\frac{2^{-nB(\omega,\lambda)-o(n)}}{B^i_w}.$$
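Now

$$\mathbb{E}\,P_i(Y_{ij})=\mathbb{E}\big[\mathbf 1\{K_{ij}\le1/2\}\,P_i(Y_{ij})\big]+\mathbb{E}\big[\mathbf 1\{K_{ij}>1/2\}\,P_i(Y_{ij})\big]\ge\frac12\,P_i(X_{ij})\,P(K_{ij}\le1/2),$$

and so, by the above discussion and Eq. (21), we get

$$P_e(C,p)\ge\frac1M\sum_{x_i\in H_{l,w}}\ \sum_{x_j\in N(i)}\mathbb{E}\,P_i(Y_{ij})\ge\frac1M\sum_{x_i\in H_{l,w}}|N(i)|\,2^{nA(\omega)}\,\frac{2^{-nB(\omega,\lambda)-o(n)}}{B^i_w}\ge 2^{n(A(\omega)-B(\omega,\lambda))-o(n)}.$$

□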



Proof of Theorem 6. Let $C$ be the code from the statement of the theorem. Let

$$F=\frac1n\log\frac{1}{P_e(C,p)}.$$

As discussed in [2], [7], for any $w=\omega n$, $\delta\le\omega\le1$, the code $C$ contains a subcode $C'$ of size $M'\ge M/n^2$ such that for all codewords $x_i$ in this subcode

$$\frac1n\log B^i_w\ge\beta(\omega)-o(1).$$

Since the subcode is substantially sized, we may now consider this subcode as our new code.

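For a fixed $\omega$ construct $Y_{ij}$, $X_{ij}$ and $K_{ij}$ for all pairs with $d_{ij}=\omega n$. By Theorems 14 and 15 we get

$$F\le\begin{cases}-\beta(\omega)-A(\omega)&\text{if no nuisance level exists for }\omega,\\ B(\omega,\lambda_1)-A(\omega)&\text{if a nuisance level }\lambda_1\text{ exists for }\omega.\end{cases}$$

Hence we get

$$F\le\max\{-\beta(\omega),B(\omega,\lambda_1)\}-A(\omega).$$

Now if $\lambda_1\ge\omega$, then $B(\omega,\lambda_1)\le B(\omega,\omega)$ by Lemma 10, and so we get

$$F\le\max\{-\beta(\omega),B(\omega,\omega)\}-A(\omega). \tag{22}$$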

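If $\lambda_1<\omega$, then we use the fact from Theorem 15 that for a substantial number of codewords $x_j$, $B^j_{\lambda_1 n}\ge 2^{-nB(\omega,\lambda_1)}$. We now construct new $Y_{ij}$, $X_{ij}$ and $K_{ij}$ for all $(i,j)$ pairs with $d_{ij}=\lambda_1 n$. Hence by Theorems 14 and 15 we get

$$F\le\begin{cases}B(\omega,\lambda_1)-A(\lambda_1)&\text{if no nuisance level exists for }\lambda_1,\\ B(\lambda_1,\lambda_2)-A(\lambda_1)&\text{if a nuisance level }\lambda_2\text{ exists for }\lambda_1.\end{cases}$$

Hence we get

$$F\le\max\{B(\omega,\lambda_1),B(\lambda_1,\lambda_2)\}-A(\lambda_1).$$

If $\lambda_2\ge\lambda_1$, then $B(\lambda_1,\lambda_2)\le B(\lambda_1,\lambda_1)\le B(\omega,\lambda_1)$, so

$$F\le B(\omega,\lambda_1)-A(\lambda_1).$$

If $\lambda_2<\lambda_1$, then we use the fact that for a substantial number of codewords $x_i$, $B^i_{\lambda_2 n}\ge 2^{-nB(\lambda_1,\lambda_2)}$, and continue as before.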
We continue in this manner and get a sequence $\omega>\lambda_1>\lambda_2>\dots$ (with $\lambda_0=\omega$) such that at step $i$ we get the bound

$$F\le\max\{B(\lambda_{i-1},\lambda_i),B(\lambda_i,\lambda_{i+1})\}-A(\lambda_i).$$

This process terminates after at most $n$ steps, since there are only $n$ possible values for the nuisance level. At the last step, $i=f$, the nuisance level $\lambda_{f+1}$, if it exists at all, is not less than $\lambda_f$ itself, and therefore we have

$$F\le\max\{B(\lambda_{f-1},\lambda_f),B(\lambda_f,\lambda_{f+1})\}-A(\lambda_f)\le\max\{B(\lambda_{f-1},\lambda_f),B(\lambda_f,\lambda_f)\}-A(\lambda_f)\le B(\omega,\lambda_f)-A(\lambda_f).$$

Now for our code either this equation or Eq. (22) is valid, and so we have shown that for every $\omega$, $\delta\le\omega\le1$, there exists $\lambda\le\omega$ such that

$$F\le\max\big(-\beta(\omega)-A(\omega),\ B(\omega,\lambda)-A(\lambda)\big).$$

This completes the proof. □

3. More on the bound of Theorem 7

In this section we take a closer look at the bound (18) with the aim of showing that it provides a new segment of code rates where the BSC channel reliability is known exactly. We rely on the notation of Sect. 1.1. Let $R_x=1-h(\delta_1)$, where $\delta_1=\dfrac{2\sqrt{p(1-p)}}{1+2\sqrt{p(1-p)}}$. Recall that the best known lower bound on $E(R,p)$ below the critical rate is given by

$$E_x(R,p)=-A(\delta_{GV}(R)),\qquad 0\le R\le R_x, \tag{23}$$

$$E_0(R,p)=D(\rho\|p)+R_{\rm crit}-R,\qquad R_x\le R\le R_{\rm crit}. \tag{24}$$

For $R\ge R_{\rm crit}$ the reliability function is $E(R,p)=E_{sp}(R,p)$. Note that both $E_x$ and $E_{sp}(R,p)$ can be viewed as instances of the union bound and that both are tangent to $E_0(R,p)$. Let us make one simple observation showing that the bound (18) has the same property.
The following lemma is verified by direct calculation.

Lemma 16. Let $\delta_1=\dfrac{2\sqrt{p(1-p)}}{1+2\sqrt{p(1-p)}}$ and let $R_1=R(\delta_1)$. Then

$$-A(\delta_1)-R_1+1-h(\delta_1)=E_0(R_1,p).$$

Proof. Indeed, (24) can be rewritten as

$$E_0(R,p)=1-R-\log\big(1+2\sqrt{p(1-p)}\big).$$

The equality in the statement is equivalent to the relation

$$h(\delta_1)+\delta_1\log 2\sqrt{p(1-p)}=\log\big(1+2\sqrt{p(1-p)}\big),$$

which is an easily verifiable identity. □
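Both the rewriting of (24) and the identity in the proof can be confirmed numerically. The sketch below is ours (names hypothetical). It evaluates $E_0$ in the divergence form $D(\rho\|p)+R_{\rm crit}-R$ and in the closed form $1-R-\log(1+2\sqrt{p(1-p)})$ (note the minus sign in front of the logarithm), and both sides of the identity with $\delta_1=u/(1+u)$, $u=2\sqrt{p(1-p)}$.

```python
import math

def h(x):
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log2(x) - (1 - x) * math.log2(1 - x)

def lemma16_sides(p):
    """Both sides of h(delta1) + delta1 * log2(u) = log2(1 + u), u = 2 sqrt(p(1-p))."""
    u = 2 * math.sqrt(p * (1 - p))
    delta1 = u / (1 + u)
    return h(delta1) + delta1 * math.log2(u), math.log2(1 + u)

def e0_divergence_form(R, p):
    """E_0(R, p) = D(rho || p) + R_crit - R with R_crit = 1 - h(rho)."""
    rho = math.sqrt(p) / (math.sqrt(p) + math.sqrt(1 - p))
    d = rho * math.log2(rho / p) + (1 - rho) * math.log2((1 - rho) / (1 - p))
    return d + (1 - h(rho)) - R

def e0_closed_form(R, p):
    return 1 - R - math.log2(1 + 2 * math.sqrt(p * (1 - p)))

if __name__ == "__main__":
    for p in (0.01, 0.05, 0.2):
        print(p, lemma16_sides(p), e0_divergence_form(0.1, p), e0_closed_form(0.1, p))
```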

Next we can prove the main result of this section.

Theorem 17. Let $p$, $0.046\le p\le1/2$, be the channel transition probability. Then the channel reliability $E(R,p)$ equals the random coding exponent $E_0(R,p)$ for $R_1\le R\le R_{\rm crit}$.

Proof. We check numerically that $R_1\le R_0^*$ for $p\ge0.046$. Thus, by Theorem 7 and Lemma 16, for these values of $p$ we have $E(R_1,p)=E_0(R_1,p)$. The full claim follows from the straight-line bound of Shannon, Gallager, and Berlekamp [27]. □
FIGURE 3. Bounds on the error exponent for the BSC with $p=0.08$. In the interval $R_1\le R\le R_{\rm crit}$ the random coding bound $E_0(R,p)$ is tight.

Remark. We have seen in an earlier lemma that for $p > 0.037$ it suffices to rely on the simple form of the function $R(x)$, namely $R(x) = \psi(x)$. Thus the only numerical calculation involved in the proof of this theorem concerns the function $B(\omega,\delta)$.

The random coding exponent $E_0(R,p)$ gives the best known lower bound on $E(R,p)$ for $R_x < R < R_{\mathrm{crit}}$. The fraction of this segment on which Theorem 17 shows it to be tight is

$\dfrac{R_{\mathrm{crit}} - R_1}{R_{\mathrm{crit}} - R_x}.$

This fraction is about 1/3 for $p = 0.05$ and tends to one as $p \to 1/2$.

We give an example of the new picture for the function $E(R,p)$ in Fig. 3. Previously the reliability of the BSC was known exactly only for $R > R_{\mathrm{crit}}$ [12].



4. Random linear codes

The inequality of Theorem 6 can be used for a code with an arbitrary distance distribution. In this section we are interested in estimating the error exponent of a random linear code $C$. Here by a random code we mean a binary code whose weight distribution behaves as the binomial distribution: $B_{\omega n} = \exp[n(R + h(\omega) - 1)]$. The reason for calling such a code random is that the weight distribution of a randomly chosen linear code converges with high probability to the binomial distribution (e.g., [13]).

The error exponent $E(R,p)$ of random linear codes at low rates is bounded below by the expurgation exponent: $E(R,p) \ge -A(\delta_{\mathrm{GV}}(R))$. For $R_x < R < R_{\mathrm{crit}}$, $E(R,p) \ge E_0(R,p)$. Moreover, it is known that the error probability $P_e(C)$ averaged over the ensemble of all binary codes meets this bound with equality [15]. The proof of this result in [15] is accomplished by computing the ensemble-average probability of error under list decoding with lists of size 2, where an error means that the transmitted codeword is not in the resulting list. It turns out that under this definition an error occurs in an exponentially smaller fraction of cases than an error of maximum likelihood decoding. In other words, in all cases of error under maximum likelihood decoding (i.e., decoding into a list of size 1) except for an exponentially small fraction of them, there is exactly one codeword that is at least as close to the received word as the transmitted codeword. This shows that, for the exponential asymptotics of the error probability of random codes, the union bound is tight. An analogous result can also be proved for the ensemble of binary linear codes.
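The convergence to the binomial distribution can be observed on toy parameters. The sketch below (hypothetical helper names; uniformly random $k \times n$ generator matrices, which need not have full rank, so codewords are counted over all $2^k$ messages) compares the average weight distribution with its ensemble expectation. Each nonzero message is mapped to a uniformly random word, so the expected number of codewords of weight $w \ge 1$ is $(2^k-1)\binom{n}{w}2^{-n}$, whose exponent $\frac1n\log_2$ tends to $R + h(\omega) - 1$ with $R = k/n$ and $\omega = w/n$.

```python
import itertools
import math
import random


def random_generator(k: int, n: int, rng: random.Random) -> list[int]:
    """k uniformly random rows of length n, each stored as an n-bit integer."""
    return [rng.getrandbits(n) for _ in range(k)]


def weight_distribution(rows: list[int], n: int) -> list[int]:
    """Number of codewords of each Hamming weight, counted over all 2^k messages."""
    counts = [0] * (n + 1)
    for message in itertools.product((0, 1), repeat=len(rows)):
        word = 0
        for bit, row in zip(message, rows):
            if bit:
                word ^= row  # GF(2) addition of generator rows
        counts[bin(word).count("1")] += 1
    return counts


def expected_count(k: int, n: int, w: int) -> float:
    """Ensemble expectation of the number of codewords of weight w."""
    nonzero = (2**k - 1) * math.comb(n, w) / 2**n
    return nonzero + (1.0 if w == 0 else 0.0)  # the zero message always gives weight 0


def average_distribution(k: int, n: int, trials: int, seed: int) -> list[float]:
    """Weight distribution averaged over `trials` random generator matrices."""
    rng = random.Random(seed)
    total = [0] * (n + 1)
    for _ in range(trials):
        for w, c in enumerate(weight_distribution(random_generator(k, n, rng), n)):
            total[w] += c
    return [t / trials for t in total]


if __name__ == "__main__":
    k, n = 6, 12
    avg = average_distribution(k, n, trials=200, seed=1)
    for w in range(n + 1):
        print(f"w={w:2d}  average={avg[w]:7.3f}  expected={expected_count(k, n, w):7.3f}")
```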

Here we compute a lower bound on the decoding error probability of a code with weight distribution $B_{\omega n}$. A closed-form expression again seems beyond reach; however, computational evidence with the bound (11) suggests that in a certain segment of code rates < R < $R^{**}$, the error exponent of maximum likelihood decoding of the code $C$ is bounded above as follows:

$E(R,p) \le -A(\delta_{\mathrm{GV}}(R)).$

In other words, the expurgation exponent is tight for a random linear code in the region of low code rates.
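For orientation, the union bound evaluated on the binomial weight distribution reproduces (23) and (24). The hypothetical helper below (base-2 logarithms) minimizes $1 - R - h(\omega) - A(\omega)$ over the weights $\omega \in [\delta_{\mathrm{GV}}(R), 1/2]$, where $B_{\omega n} \ge 1$, assuming $A(\omega) = \omega\log_2 2\sqrt{p(1-p)}$. The unconstrained minimizer is $\gamma/(1+\gamma)$ with $\gamma = 2\sqrt{p(1-p)}$, so the result equals $-A(\delta_{\mathrm{GV}}(R))$ when $\delta_{\mathrm{GV}}(R)$ exceeds that point ($R < R_x$) and $E_0(R,p)$ otherwise. This computes only the upper bound on the error probability; the tightness question discussed above concerns the reverse inequality.

```python
import math


def h2(x: float) -> float:
    """Binary entropy function in bits, with h2(0) = h2(1) = 0."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log2(x) - (1.0 - x) * math.log2(1.0 - x)


def delta_gv(R: float) -> float:
    """Root of h2(d) = 1 - R on [0, 1/2], by bisection."""
    lo, hi = 0.0, 0.5
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if h2(mid) < 1.0 - R:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def union_exponent(R: float, p: float) -> float:
    """Exponent of sum_w B_w 2^{n A(w/n)} with B_{omega n} = 2^{n(R + h(omega) - 1)}."""
    log_gamma = math.log2(2.0 * math.sqrt(p * (1.0 - p)))

    def f(omega: float) -> float:
        # Negated exponent of a single term: 1 - R - h(omega) - A(omega).
        return 1.0 - R - h2(omega) - omega * log_gamma

    lo, hi = delta_gv(R), 0.5  # weights for which B_{omega n} >= 1
    for _ in range(200):  # ternary search; f is convex in omega
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if f(m1) < f(m2):
            hi = m2
        else:
            lo = m1
    return f(0.5 * (lo + hi))


if __name__ == "__main__":
    for R in (0.05, 0.10, 0.20, 0.30):
        print(f"R = {R:.2f}  union-bound exponent = {union_exponent(R, 0.05):.5f}")
```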

5. The Gaussian channel 

Given the results for the BSC in Section 3, it is natural to expect that qualitatively similar results hold for the reliability function of the Gaussian channel. Here we briefly consider this problem and show that the random coding exponent is tight for a certain interval of rates immediately below the critical rate. As in the binary case, the length of this interval depends on the level of the channel noise.

Let $a$ be the signal-to-noise ratio in the channel. Denote by $E(R,a)$ the channel reliability function, defined analogously to the BSC case. It is known to be bounded below by the random coding bound $E_0(R,a)$ [26], which has the form

$E_0(R,a) = \frac{a}{4}(1 - \cos\theta_x) + R_x - R$

and is the best known lower bound for $R_x < R < R_{\mathrm{crit}}$, where

$R_x = \frac12 \ln\Bigl(\frac12 + \frac12\sqrt{1 + \frac{a^2}{4}}\Bigr),$

$\theta_x = \cos^{-1}\sqrt{1 - e^{-2R_x}},$

$R_{\mathrm{crit}} = \frac12 \ln\Bigl(\frac12 + \frac{a}{4} + \frac12\sqrt{1 + \frac{a^2}{4}}\Bigr).$
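These closed forms are easy to evaluate. The sketch below (hypothetical function names; natural logarithms, so rates are in nats) computes $R_x$, $\theta_x$, $R_{\mathrm{crit}}$ and the straight-line exponent $E_0(R,a)$. For $a = 2$ it gives $R_x \approx 0.094$ and $R_{\mathrm{crit}} \approx 0.267$, in line with the example after Theorem 19, and it illustrates the relation $\sin\theta_x = e^{-R_x}$ that follows from the definition of $\theta_x$.

```python
import math


def gaussian_parameters(a: float) -> tuple[float, float, float]:
    """Return (R_x, theta_x, R_crit) for signal-to-noise ratio a; rates in nats."""
    root = math.sqrt(1.0 + a * a / 4.0)
    r_x = 0.5 * math.log(0.5 + 0.5 * root)
    theta_x = math.acos(math.sqrt(1.0 - math.exp(-2.0 * r_x)))
    r_crit = 0.5 * math.log(0.5 + a / 4.0 + 0.5 * root)
    return r_x, theta_x, r_crit


def random_coding_exponent(R: float, a: float) -> float:
    """Straight-line part E_0(R, a) = (a/4)(1 - cos theta_x) + R_x - R."""
    r_x, theta_x, _ = gaussian_parameters(a)
    return a / 4.0 * (1.0 - math.cos(theta_x)) + r_x - R


if __name__ == "__main__":
    r_x, theta_x, r_crit = gaussian_parameters(2.0)
    print(f"R_x = {r_x:.3f}, theta_x = {theta_x:.4f}, R_crit = {r_crit:.3f}")
    print(f"E_0(R_crit, 2) = {random_coding_exponent(r_crit, 2.0):.4f}")
```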

Let $C$ be a code on $S^{n-1}$ (the unit sphere in $\mathbb{R}^n$). Let $\theta(x_i, x_j)$ be the angle between the vectors that correspond to the codewords $x_i, x_j$. Denote by $B(\theta)$ the distribution of angular distances in the code $C$.

The exponent of the union bound on the error probability $P_e(C,a)$ has the form

$E_U = \frac{a}{4}(1 - \cos\theta) - \frac{1}{n}\ln B(\theta).$

Used together with an estimate of the distance distribution of a code of rate $R$ obtained in [2], this bound takes the form

$E_U(R,a) = \frac{a}{4}(1 - \cos\theta) - \ln\sin\theta - R,$

where $\theta = \theta(R)$ is the root of the equation $R = \psi(\theta)$ and

$\psi(x) = -\frac{1 - \sin x}{2\sin x}\ln\frac{1 - \sin x}{1 + \sin x} - \ln\frac{2\sin x}{1 + \sin x}$

(which represents the Kabatiansky-Levenshtein bound on spherical codes). The strongest known condition for the union bound to be valid asymptotically as a lower bound on $P_e(C,a)$ was announced in [5]. According to it, $E(R,a) \le E_U(R,a)$ for all rates $R < R^*$, where $R^*$ is the root of

(25)  $R + \ln\sin\theta(R) = \frac{a}{8}\bigl(1 - \cos\theta(R)\bigr).$
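A sketch of how $R^*$ can be located numerically, assuming $\psi$ decreases from $+\infty$ to 0 on $(0,\pi/2)$ so that $\theta(R)$ can be found by bisection. The function names and the search bracket $[0.01, R_{\mathrm{crit}}]$ are illustrative choices, not part of the original analysis, and the bracket is only checked for $a = 2$. With natural logarithms the root for $a = 2$ comes out near $0.263$, consistent with the example after Theorem 19.

```python
import math


def psi(x: float) -> float:
    """Kabatiansky-Levenshtein rate (nats) for minimum angle x, 0 < x < pi/2."""
    s = math.sin(x)
    return -(1.0 - s) / (2.0 * s) * math.log((1.0 - s) / (1.0 + s)) - math.log(2.0 * s / (1.0 + s))


def theta_of_rate(R: float) -> float:
    """Root of psi(theta) = R on (0, pi/2), by bisection."""
    lo, hi = 1e-9, math.pi / 2 - 1e-6
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if psi(mid) > R:
            lo = mid  # rate still too high: the angle must grow
        else:
            hi = mid
    return 0.5 * (lo + hi)


def union_exponent_gauss(R: float, a: float) -> float:
    """E_U(R, a) = (a/4)(1 - cos theta) - ln sin theta - R with theta = theta(R)."""
    t = theta_of_rate(R)
    return a / 4.0 * (1.0 - math.cos(t)) - math.log(math.sin(t)) - R


def condition_25(R: float, a: float) -> float:
    """Left side minus right side of (25); R* is a zero of this function."""
    t = theta_of_rate(R)
    return R + math.log(math.sin(t)) - a / 8.0 * (1.0 - math.cos(t))


def solve_r_star(a: float, lo: float, hi: float) -> float:
    """Bisection for R* on a bracket [lo, hi] where condition_25 changes sign."""
    f_lo = condition_25(lo, a)
    if f_lo * condition_25(hi, a) > 0:
        raise ValueError("bracket does not contain a sign change")
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        f_mid = condition_25(mid, a)
        if (f_mid < 0) == (f_lo < 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    a = 2.0
    r_crit = 0.5 * math.log(0.5 + a / 4 + 0.5 * math.sqrt(1 + a * a / 4))
    r_star = solve_r_star(a, 0.01, r_crit)
    print(f"R* = {r_star:.3f}, E_U(R*) = {union_exponent_gauss(r_star, a):.4f}")
```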

Other conditions were obtained in |2JE1[9). 

Next we state a result analogous to Lemma 16. Its proof is immediate by comparing the expressions for $E_U$ and $E_0$.

Lemma 18. Let $R_1 = \psi(\theta_x)$. Then $E_0(R_1,a) = E_U(R_1,a)$.
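Lemma 18 can also be confirmed numerically. The sketch below, with hypothetical helper names and natural logarithms, computes $R_1 = \psi(\theta_x)$, recovers $\theta(R_1)$ by bisection (assuming $\psi$ is decreasing on $(0,\pi/2)$), and compares $E_0(R_1,a)$ with $E_U(R_1,a)$. The two agree because $\theta(R_1) = \theta_x$ and $-\ln\sin\theta_x = R_x$; for $a = 2$ it gives $R_1 \approx 0.199$.

```python
import math


def psi(x: float) -> float:
    """Kabatiansky-Levenshtein rate (nats) for minimum angle x, 0 < x < pi/2."""
    s = math.sin(x)
    return -(1.0 - s) / (2.0 * s) * math.log((1.0 - s) / (1.0 + s)) - math.log(2.0 * s / (1.0 + s))


def theta_of_rate(R: float) -> float:
    """Root of psi(theta) = R on (0, pi/2), by bisection."""
    lo, hi = 1e-9, math.pi / 2 - 1e-6
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if psi(mid) > R:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def lemma18_check(a: float) -> tuple[float, float, float]:
    """Return (R_1, E_0(R_1, a), E_U(R_1, a))."""
    r_x = 0.5 * math.log(0.5 + 0.5 * math.sqrt(1.0 + a * a / 4.0))
    theta_x = math.acos(math.sqrt(1.0 - math.exp(-2.0 * r_x)))
    r_1 = psi(theta_x)
    e_0 = a / 4.0 * (1.0 - math.cos(theta_x)) + r_x - r_1
    t = theta_of_rate(r_1)  # equals theta_x up to rounding
    e_u = a / 4.0 * (1.0 - math.cos(t)) - math.log(math.sin(t)) - r_1
    return r_1, e_0, e_u


if __name__ == "__main__":
    r_1, e_0, e_u = lemma18_check(2.0)
    print(f"R_1 = {r_1:.3f}, E_0(R_1) = {e_0:.6f}, E_U(R_1) = {e_u:.6f}")
```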
We conclude that $E_0(R_1,a)$ is the correct value of $E(R_1,a)$ if $R_1 < R^*$. The last inequality holds for < a < 5.7. Coupled with the straight-line principle of [27], this gives

Theorem 19. Let < a < 5.7 be the signal-to-noise ratio in the channel. Then

$E(R,a) = E_0(R,a) \qquad (R_1 < R < R_{\mathrm{crit}}).$

Example. Let $a = 2$. Then $R_x = 0.094$, $R_1 = 0.199$, $R^* = 0.263$, $R_{\mathrm{crit}} = 0.267$.

If instead of (25) we rely on conditions with a published proof, we can still claim that $E_0$ is tight, but for a smaller range of signal-to-noise ratios.

Postscriptum: Recently, a generalized de Caen inequality was used to derive lower estimates of the error probability of a code via its distance distribution [9]. In particular, [9] gives a condition for the union bound to be valid asymptotically as a lower bound on $P_e$ in the BSC case. Although the condition is stated as an optimization problem ([9], Prop. 5.3), computational evidence suggests that its solution is given by (16). Thus the methods of this paper and of [9], although different in nature, seem to lead to the same general estimates. Note that [9] does not contain results on the BSC reliability function.